1,243 research outputs found

    Community Detection with and without Prior Information

    Full text link
    We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction ρ\rho of the nodes their true cluster assignments are known in advance. This can be understood as a semi--supervised version of clustering, in contrast to unsupervised clustering where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter--cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute [but generic] values of ρ>0\rho>0 shift the threshold downwards to its lowest possible value. For weighted graphs we show that a small semi--supervising can be used for a non-trivial definition of communities.Comment: 6 pages, 2 figure

    Phase Transitions in Community Detection: A Solvable Toy Model

    Full text link
    Recently, it was shown that there is a phase transition in the community detection problem. This transition was first computed using the cavity method, and has been proved rigorously in the case of q=2q=2 groups. However, analytic calculations using the cavity method are challenging since they require us to understand probability distributions of messages. We study analogous transitions in so-called "zero-temperature inference" model, where this distribution is supported only on the most-likely messages. Furthermore, whenever several messages are equally likely, we break the tie by choosing among them with equal probability. While the resulting analysis does not give the correct values of the thresholds, it does reproduce some of the qualitative features of the system. It predicts a first-order detectability transition whenever q>2q > 2, while the finite-temperature cavity method shows that this is the case only when q>4q > 4. It also has a regime analogous to the "hard but detectable" phase, where the community structure can be partially recovered, but only when the initial messages are sufficiently accurate. Finally, we study a semisupervised setting where we are given the correct labels for a fraction ρ\rho of the nodes. For q>2q > 2, we find a regime where the accuracy jumps discontinuously at a critical value of ρ\rho.Comment: 6 pages, 6 figure

    Analysis of Three-Dimensional Protein Images

    Full text link
    A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction.Comment: See http://www.jair.org/ for any accompanying file

    Why Do Cascade Sizes Follow a Power-Law?

    Full text link
    We introduce random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now only empirical studies of this model were done. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow the power-law distribution, which is consistent with multiple empirical analysis of the large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation.Comment: 8 pages, 7 figures, accepted to WWW 201

    Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs

    Full text link
    We theoretically study semi-supervised clustering in sparse graphs in the presence of pairwise constraints on the cluster assignments of nodes. We focus on bi-cluster graphs, and study the impact of semi-supervision for varying constraint density and overlap between the clusters. Recent results for unsupervised clustering in sparse graphs indicate that there is a critical ratio of within-cluster and between-cluster connectivities below which clusters cannot be recovered with better than random accuracy. The goal of this paper is to examine the impact of pairwise constraints on the clustering accuracy. Our results suggests that the addition of constraints does not provide automatic improvement over the unsupervised case. When the density of the constraints is sufficiently small, their only impact is to shift the detection threshold while preserving the criticality. Conversely, if the density of (hard) constraints is above the percolation threshold, the criticality is suppressed and the detection threshold disappears.Comment: 8 pages, 4 figure
    corecore